Àá½Ã¸¸ ±â´Ù·Á ÁÖ¼¼¿ä. ·ÎµùÁßÀÔ´Ï´Ù.
KMID : 0644020190320030047
Journal Of Korean Medical Classics
2019 Volume.32 No. 3 p.47 ~ p.57
Comparison of Word Extraction Methods Based on Unsupervised Learning for Analyzing East Asian Traditional Medicine Texts
Oh Jun-Ho

Abstract
Objectives : We aim to assist in choosing an appropriate method for word extraction when analyzing East Asian Traditional Medical texts based on unsupervised learning.

Methods : In order to assign ranks to substrings, we conducted a test using one method(BE:Branching Entropy) for exterior boundary value, three methods(CS:cohesion score, TS:t-score, SL:simple-ll) for interior boundary value, and six methods(BExSL, BExTS, BExCS, CSxTS, CSxSL, TSxSL) from combining them.

Results : When Miss Rate(MR) was used as the criterion, the error was minimal when the TS and SL were used together, while the error was maximum when CS was used alone. When number of segmented texts was applied as weight value, the results were the best in the case of SL, and the worst in the case of BE alone.

Conclusions : Unsupervised-Learning-Based Word Extraction is a method that can be used to analyze texts without a prepared set of vocabulary data. When using this method, SL or the combination of SL and TS could be considered primarily.
KEYWORD
Text segmentation, Word extraction, tokenization, East Asian Traditional Medicine, Korean medicine
FullTexts / Linksout information
Listed journal information
ÇмúÁøÈïÀç´Ü(KCI)